Towards Effective Use of Training Data in Statistical Machine Translation

Authors

Philipp Koehn

Barry Haddow

Abstract

We report on findings of exploiting large data sets for translation modeling, language modeling and tuning for the development of competitive machine translation systems for eight language pairs.


Similar resources

A new model for Persian multi-part words edition based on statistical machine translation

Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrect use of space leads to some s...


Co-training for Statistical Machine Translation

I propose a novel co-training method for statistical machine translation. As co-training requires multiple learners trained on views of the data which are disjoint and sufficient for the labeling task, I use multiple source documents as views on translation. Co-training for statistical machine translation is therefore a type of multi-source translation. Unlike previous multi-source methods, it ...


Enriching Parallel Corpora for Statistical Machine Translation with Semantic Negation Rephrasing

This paper presents an approach to improving performance of statistical machine translation by automatically creating new training data for difficult to translate phenomena. In particular this contribution is targeted towards tackling the poor performance of a state-of-the-art system on negated sentences. The corpus expansion is achieved by high quality rephrasing of existing sentences to their...


The Effectiveness of Moral Intelligence Training on Students' Attitudes towards Substance

The current research aimed to investigate the effectiveness of moral intelligence training on changing male students’ attitudes towards substance. This study was a quasi-experimental research in pretest-posttest design with control group. The statistical population of the current research consisted of all male students of the Holy Prophet Farhangian University in the city of Ahvaz in the 2017-2...


The Correlation of Machine Translation Evaluation Metrics with Human Judgement on Persian Language

Machine Translation Evaluation Metrics (MTEMs) are the central core of Machine Translation (MT) engines as they are developed based on frequent evaluation. Although MTEMs are widespread today, their validity and quality for many languages is still under question. The aim of this research study was to examine the validity and assess the quality of MTEMs from Lexical Similarity set on machine tra...




Publication date: 2012
